feature shift
CGRL: Causal-Guided Representation Learning for Graph Out-of-Distribution Generalization
Lu, Bowen, Yang, Liangqiang, Li, Teng
Graph Neural Networks (GNNs) have achieved impressive performance in graph-related tasks. However, they generalize poorly to out-of-distribution (OOD) data because they tend to learn spurious correlations. These correlations manifest as a failure of GNNs to stably learn the mutual information between prediction representations and ground-truth labels under OOD settings. To address these challenges, we formulate a causal graph starting from the essence of node classification, adopt backdoor adjustment to block non-causal paths, and theoretically derive a lower bound for improving OOD generalization of GNNs. To materialize these insights, we further propose a novel approach integrating causal representation learning and a loss replacement strategy. The former captures node-level causal invariance and reconstructs the graph posterior distribution. The latter introduces asymptotic losses of the same order to replace the original losses. Extensive experiments demonstrate the superiority of our method in OOD generalization and show that it effectively alleviates unstable mutual information learning.
- North America > United States > Virginia (0.04)
- Europe > Austria (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (12 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Virginia (0.04)
- Europe > Hungary (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
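The CGRL abstract mentions backdoor adjustment as the tool for blocking non-causal paths. As a rough illustration of that general technique only (not the CGRL model), the sketch below evaluates the standard adjustment formula P(Y | do(X=x)) = sum over z of P(Y | X=x, Z=z) P(Z=z) on a toy discrete example with a single confounder Z. Every variable name and probability here is an assumption made up for illustration.

```python
# Toy backdoor adjustment over a single discrete confounder Z.
# All numbers are illustrative assumptions, not values from the paper.

# P(Z = z): marginal distribution of the confounder.
p_z = {0: 0.7, 1: 0.3}

# P(X = 1 | Z = z): how the confounder influences X.
p_x1_given_z = {0: 0.2, 1: 0.9}

# P(Y = 1 | X = x, Z = z): outcome model, keyed by (x, z).
p_y1_given_xz = {
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.7,
}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary X."""
    p1 = p_x1_given_z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y=1 | X=x): mixes the causal effect with confounding through Z."""
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    marginal = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / marginal


def interventional(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


if __name__ == "__main__":
    print("observational gap:", observational(1) - observational(0))
    print("interventional gap:", interventional(1) - interventional(0))
```

In this toy setup the observational gap is larger than the interventional one, because Z raises both the chance of X=1 and the chance of Y=1; averaging over P(Z) instead of P(Z | X) removes that spurious part.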
Structural Alignment Improves Graph Test-Time Adaptation
Hsu, Hans Hao-Hsun, Liu, Shikun, Zhao, Han, Li, Pan
Graph-based learning has achieved remarkable success in domains ranging from recommendation to fraud detection and particle physics by effectively capturing underlying interaction patterns. However, it often struggles to generalize when distribution shifts occur, particularly those involving changes in network connectivity or interaction patterns. Existing approaches designed to mitigate such shifts typically require retraining with full access to source data, rendering them infeasible under strict computational or privacy constraints. To address this limitation, we propose a test-time structural alignment (TSA) algorithm for Graph Test-Time Adaptation (GTTA), a novel method that aligns graph structures during inference without revisiting the source domain. Built upon a theoretically grounded treatment of graph data distribution shifts, TSA integrates three key strategies: an uncertainty-aware neighborhood weighting that accommodates structure shifts, an adaptive balancing of self-node and neighborhood-aggregated representations driven by node representations' signal-to-noise ratio, and a decision boundary refinement that corrects remaining label and feature shifts. Extensive experiments on synthetic and real-world datasets demonstrate that TSA can consistently outperform both non-graph TTA methods and state-of-the-art GTTA baselines.
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China (0.04)
TabFSBench: Tabular Benchmark for Feature Shifts in Open Environment
Cheng, Zi-Jian, Jia, Zi-Yi, Zhou, Zhi, Guo, Lan-Zhe, Li, Yu-Feng
Tabular data is widely utilized in various machine learning tasks. Current tabular learning research predominantly focuses on closed environments, while real-world applications often involve open environments, where distribution and feature shifts occur and lead to significant degradation in model performance. Previous research has primarily concentrated on mitigating distribution shifts, whereas feature shifts, a distinctive and unexplored challenge of tabular data, have garnered limited attention. To this end, this paper conducts the first comprehensive study on feature shifts in tabular data and introduces the first tabular feature-shift benchmark (TabFSBench). TabFSBench evaluates the impact of four distinct feature-shift scenarios on four tabular model categories across various datasets and, for the first time, assesses the performance of large language models (LLMs) and tabular LLMs on a tabular benchmark. Our study yields three main observations: (1) most tabular models have limited applicability in feature-shift scenarios; (2) the importance of the shifted feature set has a linear relationship with model performance degradation; (3) model performance in closed environments correlates with feature-shift performance. Future research directions are also explored for each observation. TabFSBench is publicly available and can be used with a few lines of Python code at https://github.com/LAMDASZ-ML/TabFSBench.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Oceania > Australia > New South Wales (0.04)
- Antarctica (0.04)
- North America > United States > Indiana > Hamilton County > Fishers (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
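Observation (2) in the TabFSBench abstract concerns a linear relationship between the importance of the shifted features and the resulting performance drop. One simple way to check such a relationship on your own measurements is a Pearson correlation, sketched below in plain Python. The function name and its inputs are hypothetical; this does not use the TabFSBench API, and no benchmark numbers are reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    xs: e.g. summed importance of the shifted features per experiment (assumed input)
    ys: e.g. measured drop in accuracy per experiment (assumed input)
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_x * std_y)
```

A value close to 1 would be consistent with a linear increase in degradation as more important features shift; a value near 0 would not.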
Optimal probabilistic feature shifts for reclassification in tree ensembles
Blanco, Víctor, Japón, Alberto, Puerto, Justo, Zhang, Peter
In this paper we provide a novel methodology, based on mathematical optimization, to perturb the features of a given observation so that a tree ensemble classification rule re-classifies it into a desired class. The method is based on three facts: the most viable changes for an observation to reach the desired class do not always coincide with the closest point (in the feature space) of the target class; individuals put effort into a small number of features to reach the desired class; and each individual is endowed with a probability of changing each of its features to a given value, which determines the overall probability of changing to the target class. Putting these together, we provide different methods to find the features on which individuals must exert effort to maximize the probability of reaching the target class. Our method also allows us to rank the most important features in the tree ensemble. The proposed methodology is tested on a real dataset, validating the proposal.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- South America > Peru (0.04)
- South America > Colombia (0.04)
- (4 more...)
- Health & Medicine > Consumer Health (1.00)
- Transportation (0.93)
- Government (0.68)
- Education > Health & Safety > School Nutrition (0.46)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.46)
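The idea in the tree-ensemble abstract (choose a few features to work on so that the probability of reaching the target class is maximal) can be illustrated with a tiny brute-force search. This is only a sketch under stated assumptions: the ensemble is three hypothetical decision stumps with majority voting, each feature has one candidate new value with an assumed success probability, attempts are assumed independent, and exhaustive enumeration stands in for the mathematical optimization models the paper proposes.

```python
from itertools import combinations, product

# Hypothetical ensemble: each stump is
# (feature index, threshold, class if value <= threshold, class otherwise).
STUMPS = [
    (0, 5.0, 0, 1),
    (1, 2.0, 0, 1),
    (2, 0.5, 1, 0),
]


def predict(x):
    """Majority vote of the stumps over binary classes 0/1."""
    votes = [left if x[f] <= thr else right for f, thr, left, right in STUMPS]
    return 1 if sum(votes) * 2 > len(votes) else 0


def reach_probability(x, changes, chosen, target):
    """Probability that x ends up in `target` when effort goes into `chosen`.

    changes[f] = (new_value, success_probability); attempts are independent.
    """
    total = 0.0
    for outcome in product([True, False], repeat=len(chosen)):
        prob, moved = 1.0, list(x)
        for f, succeeded in zip(chosen, outcome):
            new_value, p = changes[f]
            prob *= p if succeeded else 1.0 - p
            if succeeded:
                moved[f] = new_value
        if predict(moved) == target:
            total += prob
    return total


def best_effort(x, changes, target, max_features):
    """Exhaustively search feature subsets of size <= max_features."""
    best = ((), reach_probability(x, changes, (), target))
    for k in range(1, max_features + 1):
        for chosen in combinations(sorted(changes), k):
            p = reach_probability(x, changes, chosen, target)
            if p > best[1]:
                best = (chosen, p)
    return best


# Illustrative observation (currently class 0) and candidate changes.
EXAMPLE_X = [4.0, 1.0, 1.0]
EXAMPLE_CHANGES = {0: (6.0, 0.9), 1: (3.0, 0.5), 2: (0.2, 0.8)}

if __name__ == "__main__":
    print(best_effort(EXAMPLE_X, EXAMPLE_CHANGES, target=1, max_features=2))
```

With effort limited to two features, the search picks features 0 and 2 (probability about 0.72) rather than any pair involving the less reliable feature 1. Enumeration grows exponentially with the number of features, which is one reason an optimization formulation is the practical choice for real ensembles.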
Pairwise Alignment Improves Graph Domain Adaptation
Liu, Shikun, Zou, Deyu, Zhao, Han, Li, Pan
Graph-based methods, pivotal for label inference over interconnected objects in many real-world applications, often encounter generalization challenges if the graph used for model training differs significantly from the graph used for testing. This work delves into Graph Domain Adaptation (GDA) to address the unique complexities of distribution shifts over graph data, where interconnected data points experience shifts in features, labels, and, in particular, connecting patterns. We propose a novel, theoretically principled method, Pairwise Alignment (Pair-Align), to counter graph structure shift by mitigating conditional structure shift (CSS) and label shift (LS). Pair-Align uses edge weights to recalibrate the influence among neighboring nodes to handle CSS and adjusts the classification loss with label weights to handle LS. Our method demonstrates superior performance in real-world applications, including node classification with region shift in social networks and the pileup mitigation task in particle colliding experiments. For the first application, we also curate the largest dataset to date for GDA studies. Our method also shows strong performance on synthetic and other existing benchmark datasets.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
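The Pair-Align abstract says the classification loss is adjusted with label weights to handle label shift. A common generic form of that idea is importance weighting with w[y] = P_target(y) / P_source(y), sketched below in plain Python. This is an assumption-laden illustration, not Pair-Align itself: the target label distribution is simply passed in (the abstract does not say how the weights are estimated), and the function names are hypothetical.

```python
import math
from collections import Counter


def label_weights(source_labels, target_label_dist):
    """w[y] = P_target(y) / P_source(y), with P_source estimated from counts.

    target_label_dist is assumed to be known or estimated elsewhere.
    """
    counts = Counter(source_labels)
    n = len(source_labels)
    return {y: target_label_dist[y] / (counts[y] / n) for y in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log p(y | x) over labeled source samples.

    probs[i][y] is the predicted probability of class y for sample i.
    """
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    source_labels = [0, 0, 0, 1]       # class 1 is rare in the source graph
    target_dist = {0: 0.5, 1: 0.5}     # assumed balanced target domain
    w = label_weights(source_labels, target_dist)
    uniform = [[0.5, 0.5]] * len(source_labels)
    print(w, weighted_cross_entropy(uniform, source_labels, w))
```

Upweighting the rare source class makes the training loss behave as if labels were drawn from the target distribution, which is the standard motivation for this kind of reweighting.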